In this exercise, we will use multi-species spatial and nonspatial
occupancy models to model the distributions of 10 passerine (songbird)
species across Switzerland. These data come from the Switzerland
Breeding Bird Survey (Swiss MHB) in 2014. Bird abundance data were
obtained at 267 1km squares across Switzerland and were simplified to
detection-nondetection data for occupancy modeling. The data come from
the Swiss Ornithological
Institute and were obtained via the R packages AHMbook and unmarked.
The covariate data come from the Swiss Federal
Statistical Office. More information on these data can be found in
the Applied
Hierarchical Modeling books
We first load spOccupancy as well as a few other
packages we will use for summarizing model output and generating
visualizations. We also set the seed so we can all get the same exact
results.
set.seed(500)
library(spOccupancy)
library(MCMCvis)
library(ggplot2)
library(pals)
library(sf)
# If not using the RStudio project, set working directory to the exercise-2 directory
# directory.
The data are stored in an R data file object called
swiss-mhb-2014-data.rda. We load this object below, which
reads in a list called data.swiss.mhb.
load("swiss-mhb-2014-data.rda")
# Check out the structure of the list
str(data.swiss.mhb)
List of 4
$ y : num [1:10, 1:266, 1:3] 0 1 0 0 1 0 0 1 0 1 ...
..- attr(*, "dimnames")=List of 3
.. ..$ sps : chr [1:10] "Eurasian Jay" "Blue Tit" "Coal Tit" "Winter Wren" ...
.. ..$ site: chr [1:266] "Q001" "Q002" "Q003" "Q004" ...
.. ..$ rep : chr [1:3] "1" "2" "3"
$ occ.covs:'data.frame': 266 obs. of 2 variables:
..$ elevation: int [1:266] 450 450 1050 950 1150 550 750 650 550 550 ...
..$ forest : int [1:266] 3 21 32 9 35 2 6 60 5 13 ...
$ det.covs:List of 2
..$ date: int [1:266, 1:3] 21 26 25 40 16 52 18 17 18 25 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:266] "Q001" "Q002" "Q003" "Q004" ...
.. .. ..$ : chr [1:3] "date141" "date142" "date143"
..$ dur : int [1:266, 1:3] 215 195 210 310 240 180 180 195 190 195 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:266] "Q001" "Q002" "Q003" "Q004" ...
.. .. ..$ : chr [1:3] "dur141" "dur142" "dur143"
$ coords : int [1:266, 1:2] 922942 928942 928942 934942 934942 946942 946942 952942 958942 958942 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:266] "1" "2" "3" "4" ...
.. ..$ : chr [1:2] "x" "y"
The data.swiss.mhb object is stored in the exact format
required for fitting multi-species occupancy models in
spOccupancy. The list is comprised of four objects, which
are identical to those we saw with single-species occupancy models in
Exercise 1, with the only change in the format of the
detection-nondetection data y:
y: the detection-nondetection data. This is a
three-dimensional array where the first dimension corresponds to
species, second dimension corresponds to sites, and the third dimension
corresponds to visits. Note that for imbalanced data sets where each
site may not have the same number of visits, NA values
should be placed in those site/visit combinations without any data.
spOccupancy assumes the same pattern of missing values
across all species. Here we are working with a set of 10 passerine
species that is a random subset of the species analyzed in Tobler et al. (2019).occ.covs: the covariates for use in the occupancy
portion of the model. This is a data frame or matrix of covariates, with
each row corresponding to a site, and each column corresponding to a
different variable. Here we see two covariates: elevation (the mean
elevation of the 1km cell) and percent forest cover within the 1km
cell.det.covs: the covariates for use in the detection
portion of the model. This is a list, where each element of the list
corresponds to a different covariate. Covariates on detection can be
either site-level or observation-level. For site-level covariates, they
should be specified as a vector with length equal to the total number of
sites in the data set. For observation-level covariates, they should be
specified as a matrix with rows corresponding to sites and columns
corresponding to visit. Here we have two observation-level covariates:
the Julian date of the survey and the duration in minutes the survey
took place.coords: the spatial coordinates of the sites. This is a
matrix with rows corresponding to sites and two columns corresponding to
the easting and northing coordinates of each given location. Note that
spOccupancy assumes the coordinates are in a projected
coordinate system (i.e., not latitude/longitude). The
coords component is only required for spatially-explicit
models in spOccupancy.Below we generate a simple plot that shows the total number of the potential 10 species detected at each of the 267 sites. We also generate a simple scatter plot showing the relationship between observed richness and the two occupancy covariates in our model (elevation and forest cover).
# 1 if a given species was ever observed across the 3 replicates, 0 if not
y.max.by.sp <- apply(data.swiss.mhb$y, c(1, 2), max, na.rm = TRUE)
# "Naive" species richness
raw.richness <- apply(y.max.by.sp, 2, sum)
plot.df <- data.frame(val = raw.richness,
x = data.swiss.mhb$coords[, 1],
y = data.swiss.mhb$coords[, 2])
plot.sf <- st_as_sf(plot.df,
coords = c('x', 'y'),
crs = "+proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333
+k_0=1 +x_0=600000 +y_0=200000 +ellps=bessel
+towgs84=674.374,15.056,405.346,0,0,0,0 +units=m +no_defs")
# Map of observed species richness
ggplot() +
geom_sf(data = plot.sf, aes(col = val), size = 2) +
scale_color_viridis_c() +
theme_bw(base_size = 15) +
labs(x = "Longitude", y = "Latitude", col = "Observed\nRichness")
# Plot of forest cover vs. observed richness
plot(data.swiss.mhb$occ.covs$forest, raw.richness, pch = 19,
xlab = 'Forest Cover (%)', ylab = 'Observed Richness',
ylim = c(0, nrow(data.swiss.mhb$y)))
# Plot of elevation vs. observed richness
plot(data.swiss.mhb$occ.covs$elevation, raw.richness, pch = 19,
xlab = 'Elevation (%)', ylab = 'Observed Richness',
ylim = c(0, nrow(data.swiss.mhb$y)))
The function to fit a non-spatial multi-species occupancy model is
msPGOcc(). Here we will fit a multi-species occupancy model
with linear and quadratic effects of forest cover and elevation on
occupancy, as well as a linear effect of survey duration and linear and
quadartic effect of survey date on detection probability. As we did in
Exercise 1, we will use the scale() function when fitting
the model to standardize all covariates to have a mean of 0 and standard
deviation of 1.
The complete statistical model we will fit is shown below.
\[\begin{aligned} y_{i, j, k} &\sim \text{Bernoulli}(z_{i, j} \cdot p_{i, j, k}) \\ \text{logit}(p_{i, j, k}) &= \alpha_{0, i} + \alpha_{1, i} \cdot \text{DATE}_{j, k} + \alpha_{2, i} \cdot \text{DATE}^2_{j, k} + \alpha_{3, i} \cdot \text{DURATION}_{j, k} \\ z_{i, j} &\sim \text{Bernoulli}(\psi_{i, j}) \\ \text{logit}(\psi_{i, j}) &= \beta_{0, i} + \beta_{1, i} \cdot \text{FOREST}_j + \beta_{2, i} \cdot \text{FOREST}^2 + \beta_{3, i} \cdot \text{ELEV}_j + \beta_{4, i} \cdot \text{ELEV}^2 \\ \beta_{r, i} &\sim \text{Normal}(\mu_{\beta_r}, \tau^2_{\beta_r}) \\ \alpha_{r, i} &\sim \text{Normal}(\mu_{\alpha_r}, \tau^2_{\alpha_r}) \\ \mu_{\beta_r} &\sim \text{Normal}(0, 2.72) \\ \mu_{\alpha_r} &\sim \text{Normal}(0, 2.72) \\ \tau^2_{\beta_r} &\sim \text{Inverse-Gamma}(0.1, 0.1) \\ \tau^2_{\alpha_r} &\sim \text{Inverse-Gamma}(0.1, 0.1) \\ \end{aligned}\]# Approx run time: < 2 min
out.msPGOcc <- msPGOcc(occ.formula = ~ scale(forest) + I(scale(forest)^2) +
scale(elevation) + I(scale(elevation)^2),
det.formula = ~ scale(date) + I(scale(date)^2) +
scale(dur),
data = data.swiss.mhb,
n.samples = 10000,
n.thin = 5,
verbose = TRUE,
n.burn = 5000,
n.chains = 3,
n.report = 2000)
----------------------------------------
Preparing to run the model
----------------------------------------
No prior specified for beta.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for alpha.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for tau.sq.beta.ig.
Setting prior shape to 0.1 and prior scale to 0.1
No prior specified for tau.sq.alpha.ig.
Setting prior shape to 0.1 and prior scale to 0.1
z is not specified in initial values.
Setting initial values based on observed data
beta.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
alpha.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10
tau.sq.alpha is not specified in initial values.
Setting to initial values to random values between 0.5 and 10
beta is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
alpha is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
----------------------------------------
Model description
----------------------------------------
Multi-species Occupancy Model with Polya-Gamma latent
variable fit with 266 sites and 10 species.
Samples per Chain: 10000
Burn-in: 5000
Thinning Rate: 5
Number of Chains: 3
Total Posterior Samples: 3000
Source compiled with OpenMP support and model fit using 1 thread(s).
----------------------------------------
Chain 1
----------------------------------------
Sampling ...
Sampled: 2000 of 10000, 20.00%
-------------------------------------------------
Sampled: 4000 of 10000, 40.00%
-------------------------------------------------
Sampled: 6000 of 10000, 60.00%
-------------------------------------------------
Sampled: 8000 of 10000, 80.00%
-------------------------------------------------
Sampled: 10000 of 10000, 100.00%
----------------------------------------
Chain 2
----------------------------------------
Sampling ...
Sampled: 2000 of 10000, 20.00%
-------------------------------------------------
Sampled: 4000 of 10000, 40.00%
-------------------------------------------------
Sampled: 6000 of 10000, 60.00%
-------------------------------------------------
Sampled: 8000 of 10000, 80.00%
-------------------------------------------------
Sampled: 10000 of 10000, 100.00%
----------------------------------------
Chain 3
----------------------------------------
Sampling ...
Sampled: 2000 of 10000, 20.00%
-------------------------------------------------
Sampled: 4000 of 10000, 40.00%
-------------------------------------------------
Sampled: 6000 of 10000, 60.00%
-------------------------------------------------
Sampled: 8000 of 10000, 80.00%
-------------------------------------------------
Sampled: 10000 of 10000, 100.00%
We see a lot of messages in the “Preparing to run the model” section
that tell us the default priors and initial values that are used when
fitting the model, since we did not explicitly specify them ourselves.
For multi-species occupancy models, we can set priors for the
community-level mean (beta.comm and
alpha.comm) and variance parameters
(tau.sq.beta and tau.sq.alpha). For the mean
parameters, we use normal (Gaussian) priors, while for the variance
parameters, we use inverse-Gamma priors.
# These are the default priors used in spOccupancy.
priors <- list(beta.comm.normal = list(mean = 0, var = 2.72),
alpha.comm.normal = list(mean = 0, var = 2.72),
tau.sq.beta.ig = list(a = 0.1, b = 0.1),
tau.sq.alpha.ig = list(a = 0.1, b = 0.1))
Initial values can also be specified for both species-level and community-level parameters. As with single-species models, the initial values are generally inconsequential for nonspatial multi-species models. For spatial models, specifying good initial values can sometimes be important for decreasing run time to convergence.
inits <- list(beta.comm = 0, alpha.comm = 0, beta = 0, alpha = 0,
tau.sq.beta = 1, tau.sq.alpha = 1,
z = apply(data.swiss.mhb$y, c(1, 2), max, na.rm = TRUE))
# Rerun the model with inits and priors explicitly specified.
out.msPGOcc <- msPGOcc(occ.formula = ~ scale(forest) + I(scale(forest)^2) +
scale(elevation) + I(scale(elevation)^2),
det.formula = ~ scale(date) + I(scale(date)^2) +
scale(dur),
data = data.swiss.mhb,
inits = inits,
priors = priors,
n.samples = 10000,
n.thin = 5,
verbose = TRUE,
n.burn = 5000,
n.chains = 3,
n.report = 2000)
----------------------------------------
Preparing to run the model
----------------------------------------
----------------------------------------
Model description
----------------------------------------
Multi-species Occupancy Model with Polya-Gamma latent
variable fit with 266 sites and 10 species.
Samples per Chain: 10000
Burn-in: 5000
Thinning Rate: 5
Number of Chains: 3
Total Posterior Samples: 3000
Source compiled with OpenMP support and model fit using 1 thread(s).
----------------------------------------
Chain 1
----------------------------------------
Sampling ...
Sampled: 2000 of 10000, 20.00%
-------------------------------------------------
Sampled: 4000 of 10000, 40.00%
-------------------------------------------------
Sampled: 6000 of 10000, 60.00%
-------------------------------------------------
Sampled: 8000 of 10000, 80.00%
-------------------------------------------------
Sampled: 10000 of 10000, 100.00%
----------------------------------------
Chain 2
----------------------------------------
Sampling ...
Sampled: 2000 of 10000, 20.00%
-------------------------------------------------
Sampled: 4000 of 10000, 40.00%
-------------------------------------------------
Sampled: 6000 of 10000, 60.00%
-------------------------------------------------
Sampled: 8000 of 10000, 80.00%
-------------------------------------------------
Sampled: 10000 of 10000, 100.00%
----------------------------------------
Chain 3
----------------------------------------
Sampling ...
Sampled: 2000 of 10000, 20.00%
-------------------------------------------------
Sampled: 4000 of 10000, 40.00%
-------------------------------------------------
Sampled: 6000 of 10000, 60.00%
-------------------------------------------------
Sampled: 8000 of 10000, 80.00%
-------------------------------------------------
Sampled: 10000 of 10000, 100.00%
As with all model types in spOccupancy, we can use the
summary() function to print a summary of the model results.
For multi-species occupancy models, the argument level
allows us to specify whether we want a summary for the community-level
parameters only (level = 'community'), the species-level
parameters only (level = 'species'), or both
(level = 'both').
# Quick summary of the model results at the community level.
summary(out.msPGOcc, level = 'community')
Call:
msPGOcc(occ.formula = ~scale(forest) + I(scale(forest)^2) + scale(elevation) +
I(scale(elevation)^2), det.formula = ~scale(date) + I(scale(date)^2) +
scale(dur), data = data.swiss.mhb, inits = inits, priors = priors,
n.samples = 10000, verbose = TRUE, n.report = 2000, n.burn = 5000,
n.thin = 5, n.chains = 3)
Samples per Chain: 10000
Burn-in: 5000
Thinning Rate: 5
Number of Chains: 3
Total Posterior Samples: 3000
Run Time (min): 1.6545
----------------------------------------
Community Level
----------------------------------------
Occurrence Means (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept) 1.1146 1.0387 -1.0275 1.1390 3.1449 1.0013 3000
scale(forest) 0.7719 0.5943 -0.4671 0.7844 1.9662 1.0012 3000
I(scale(forest)^2) -0.6082 0.2110 -1.0597 -0.5996 -0.2240 1.0164 1557
scale(elevation) -0.8839 0.6562 -2.1446 -0.9055 0.4992 1.0010 3000
I(scale(elevation)^2) -1.2194 0.2946 -1.7856 -1.2274 -0.6096 1.0029 1215
Occurrence Variances (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept) 18.8837 11.7486 6.4595 15.5351 49.8829 1.0025 1609
scale(forest) 3.8580 2.6796 1.1752 3.1758 11.1349 1.0074 947
I(scale(forest)^2) 0.2697 0.2672 0.0521 0.2000 0.8553 1.0113 1425
scale(elevation) 4.9715 3.2465 1.6911 4.1906 13.0628 1.0060 2139
I(scale(elevation)^2) 0.5906 0.5717 0.0728 0.4362 1.9781 1.0023 1022
Detection Means (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept) 1.8507 0.3948 1.0454 1.8581 2.6324 1.0005 3000
scale(date) 0.1542 0.1810 -0.2068 0.1503 0.5255 1.0036 2758
I(scale(date)^2) 0.0951 0.1334 -0.1522 0.0897 0.3752 1.0009 3002
scale(dur) 0.3076 0.1523 0.0020 0.3104 0.6181 1.0023 2652
Detection Variances (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept) 1.5663 1.1996 0.4613 1.2390 4.4811 1.0012 2352
scale(date) 0.2460 0.1987 0.0544 0.1932 0.7481 1.0053 2181
I(scale(date)^2) 0.1157 0.1030 0.0271 0.0879 0.3653 1.0051 2735
scale(dur) 0.1800 0.1295 0.0471 0.1441 0.5341 1.0049 2444
Looking at the community-level effects, we see occupancy on average has a significant negative quadratic relationship with both forest cover and elevation. This indicates that occupancy, on average, peaks at some value of elevation and forest cover and then subsequently decreases. This aligns with our exploratory data analysis (EDA) plots. Notice also substantial variation in the occupancy intercept and linear effects of forest cover and elevation across the community. We can look more closely at this variation by looking at the species-specific effects.
summary(out.msPGOcc, level = 'species')
Call:
msPGOcc(occ.formula = ~scale(forest) + I(scale(forest)^2) + scale(elevation) +
I(scale(elevation)^2), det.formula = ~scale(date) + I(scale(date)^2) +
scale(dur), data = data.swiss.mhb, inits = inits, priors = priors,
n.samples = 10000, verbose = TRUE, n.report = 2000, n.burn = 5000,
n.thin = 5, n.chains = 3)
Samples per Chain: 10000
Burn-in: 5000
Thinning Rate: 5
Number of Chains: 3
Total Posterior Samples: 3000
Run Time (min): 1.6545
----------------------------------------
Species Level
----------------------------------------
Occurrence (logit scale):
Mean SD 2.5% 50% 97.5%
(Intercept)-Eurasian Jay 2.4715 0.4407 1.6646 2.4523 3.4129
(Intercept)-Blue Tit 1.4705 0.3504 0.7960 1.4801 2.1718
(Intercept)-Coal Tit 5.7669 0.8308 4.2879 5.7030 7.5926
(Intercept)-Winter Wren 7.0102 1.0508 5.1934 6.9138 9.3430
(Intercept)-European Robin 7.2109 1.1943 5.2462 7.0932 9.8327
(Intercept)-Common Redstart -0.4853 0.2637 -0.9932 -0.4928 0.0470
(Intercept)-Whinchat -1.5914 0.3664 -2.3723 -1.5744 -0.9257
(Intercept)-House Sparrow 0.1211 0.3204 -0.5209 0.1246 0.7403
(Intercept)-Italian Sparrow -4.7140 0.9434 -6.7360 -4.6296 -3.0403
(Intercept)-European Goldfinch 0.9328 0.2777 0.3911 0.9307 1.4861
scale(forest)-Eurasian Jay 1.8057 0.3554 1.1682 1.7937 2.5510
scale(forest)-Blue Tit 0.4402 0.3901 -0.3470 0.4498 1.2054
scale(forest)-Coal Tit 2.3366 0.4676 1.5570 2.2800 3.3946
scale(forest)-Winter Wren 1.9356 0.6389 0.9180 1.8601 3.4657
scale(forest)-European Robin 3.8038 1.0836 2.1237 3.6668 6.2925
scale(forest)-Common Redstart 0.1748 0.2253 -0.2565 0.1735 0.6355
scale(forest)-Whinchat -1.1290 0.3735 -1.9031 -1.1138 -0.4304
scale(forest)-House Sparrow -1.4773 0.3076 -2.1211 -1.4614 -0.9031
scale(forest)-Italian Sparrow 1.5221 0.6603 0.3284 1.4802 2.9490
scale(forest)-European Goldfinch -0.6658 0.2172 -1.0903 -0.6629 -0.2592
I(scale(forest)^2)-Eurasian Jay -0.5481 0.2595 -1.0683 -0.5502 -0.0218
I(scale(forest)^2)-Blue Tit -0.4765 0.2432 -0.9629 -0.4765 -0.0032
I(scale(forest)^2)-Coal Tit -0.8385 0.3340 -1.5020 -0.8436 -0.1606
I(scale(forest)^2)-Winter Wren -1.0253 0.3973 -1.8680 -0.9963 -0.2956
I(scale(forest)^2)-European Robin -0.9823 0.4841 -1.9614 -0.9551 -0.1041
I(scale(forest)^2)-Common Redstart 0.0121 0.1716 -0.3261 0.0098 0.3450
I(scale(forest)^2)-Whinchat -0.6089 0.3187 -1.2622 -0.6011 -0.0194
I(scale(forest)^2)-House Sparrow -0.4655 0.2762 -1.0255 -0.4588 0.0784
I(scale(forest)^2)-Italian Sparrow -0.8174 0.4086 -1.7523 -0.7879 -0.1125
I(scale(forest)^2)-European Goldfinch -0.4031 0.1866 -0.7746 -0.3990 -0.0480
scale(elevation)-Eurasian Jay -1.3878 0.2980 -2.0314 -1.3671 -0.8561
scale(elevation)-Blue Tit -3.8030 0.5234 -4.9085 -3.7734 -2.8939
scale(elevation)-Coal Tit 0.5711 0.2430 0.1062 0.5657 1.0575
scale(elevation)-Winter Wren 0.4264 0.2726 -0.0989 0.4200 0.9706
scale(elevation)-European Robin -0.8251 0.2794 -1.3805 -0.8168 -0.2814
scale(elevation)-Common Redstart -0.0343 0.2046 -0.4410 -0.0270 0.3597
scale(elevation)-Whinchat 2.3630 0.5351 1.4598 2.3202 3.5439
scale(elevation)-House Sparrow -3.3720 0.4939 -4.4104 -3.3451 -2.4938
scale(elevation)-Italian Sparrow -2.8037 1.1181 -5.1395 -2.7481 -0.8747
scale(elevation)-European Goldfinch -1.3017 0.2188 -1.7490 -1.2969 -0.8817
I(scale(elevation)^2)-Eurasian Jay -0.9740 0.3514 -1.6467 -0.9855 -0.2696
I(scale(elevation)^2)-Blue Tit -0.3933 0.5796 -1.4001 -0.4212 0.8573
I(scale(elevation)^2)-Coal Tit -2.2128 0.4797 -3.2313 -2.1782 -1.3990
I(scale(elevation)^2)-Winter Wren -1.6410 0.3906 -2.4390 -1.6146 -0.9466
I(scale(elevation)^2)-European Robin -1.3499 0.4559 -2.3021 -1.3320 -0.4935
I(scale(elevation)^2)-Common Redstart -0.9185 0.2700 -1.4348 -0.9159 -0.4029
I(scale(elevation)^2)-Whinchat -1.5063 0.3909 -2.3513 -1.4881 -0.7815
I(scale(elevation)^2)-House Sparrow -1.4685 0.4677 -2.4567 -1.4442 -0.6002
I(scale(elevation)^2)-Italian Sparrow -0.7652 0.6991 -2.0326 -0.8252 0.7330
I(scale(elevation)^2)-European Goldfinch -1.1817 0.2926 -1.7694 -1.1731 -0.6342
Rhat ESS
(Intercept)-Eurasian Jay 1.0017 1452
(Intercept)-Blue Tit 1.0024 2460
(Intercept)-Coal Tit 1.0225 834
(Intercept)-Winter Wren 1.0094 573
(Intercept)-European Robin 1.0078 260
(Intercept)-Common Redstart 1.0008 2697
(Intercept)-Whinchat 1.0044 1692
(Intercept)-House Sparrow 1.0056 3000
(Intercept)-Italian Sparrow 1.0112 344
(Intercept)-European Goldfinch 1.0020 2815
scale(forest)-Eurasian Jay 1.0027 1385
scale(forest)-Blue Tit 1.0028 1759
scale(forest)-Coal Tit 1.0090 700
scale(forest)-Winter Wren 1.0553 463
scale(forest)-European Robin 1.0145 216
scale(forest)-Common Redstart 1.0054 2635
scale(forest)-Whinchat 1.0029 1846
scale(forest)-House Sparrow 1.0112 2411
scale(forest)-Italian Sparrow 1.0069 740
scale(forest)-European Goldfinch 1.0029 2833
I(scale(forest)^2)-Eurasian Jay 1.0086 2307
I(scale(forest)^2)-Blue Tit 1.0035 2334
I(scale(forest)^2)-Coal Tit 1.0034 1389
I(scale(forest)^2)-Winter Wren 1.0096 1029
I(scale(forest)^2)-European Robin 1.0004 676
I(scale(forest)^2)-Common Redstart 1.0014 2582
I(scale(forest)^2)-Whinchat 1.0036 1329
I(scale(forest)^2)-House Sparrow 0.9999 2349
I(scale(forest)^2)-Italian Sparrow 1.0006 1228
I(scale(forest)^2)-European Goldfinch 0.9997 3083
scale(elevation)-Eurasian Jay 1.0066 1521
scale(elevation)-Blue Tit 1.0155 946
scale(elevation)-Coal Tit 1.0028 2142
scale(elevation)-Winter Wren 1.0001 1715
scale(elevation)-European Robin 0.9999 2497
scale(elevation)-Common Redstart 1.0014 3000
scale(elevation)-Whinchat 1.0103 747
scale(elevation)-House Sparrow 1.0046 803
scale(elevation)-Italian Sparrow 1.0099 208
scale(elevation)-European Goldfinch 0.9996 2078
I(scale(elevation)^2)-Eurasian Jay 1.0001 1663
I(scale(elevation)^2)-Blue Tit 1.0056 795
I(scale(elevation)^2)-Coal Tit 1.0300 1006
I(scale(elevation)^2)-Winter Wren 1.0028 1792
I(scale(elevation)^2)-European Robin 0.9997 1824
I(scale(elevation)^2)-Common Redstart 1.0010 2281
I(scale(elevation)^2)-Whinchat 1.0040 1214
I(scale(elevation)^2)-House Sparrow 1.0100 936
I(scale(elevation)^2)-Italian Sparrow 1.0060 340
I(scale(elevation)^2)-European Goldfinch 1.0022 2267
Detection (logit scale):
Mean SD 2.5% 50% 97.5%
(Intercept)-Eurasian Jay 1.0063 0.1406 0.7303 1.0059 1.2868
(Intercept)-Blue Tit 1.6863 0.1703 1.3520 1.6865 2.0208
(Intercept)-Coal Tit 2.7840 0.2280 2.3573 2.7743 3.2386
(Intercept)-Winter Wren 3.2582 0.2577 2.7627 3.2489 3.7842
(Intercept)-European Robin 2.7993 0.2249 2.3676 2.7967 3.2436
(Intercept)-Common Redstart 0.8451 0.2297 0.4007 0.8406 1.2974
(Intercept)-Whinchat 0.8284 0.3716 0.1114 0.8318 1.5401
(Intercept)-House Sparrow 2.3348 0.2524 1.8637 2.3292 2.8352
(Intercept)-Italian Sparrow 3.2472 0.9878 1.6046 3.1507 5.4514
(Intercept)-European Goldfinch 0.8506 0.1643 0.5311 0.8471 1.1834
scale(date)-Eurasian Jay 0.0463 0.1149 -0.1703 0.0439 0.2731
scale(date)-Blue Tit -0.3050 0.1831 -0.6578 -0.3039 0.0448
scale(date)-Coal Tit 0.5225 0.2226 0.1197 0.5110 1.0029
scale(date)-Winter Wren -0.2580 0.1800 -0.6156 -0.2608 0.0963
scale(date)-European Robin 0.1236 0.2049 -0.2709 0.1182 0.5433
scale(date)-Common Redstart -0.0371 0.1701 -0.3727 -0.0366 0.2967
scale(date)-Whinchat 0.7601 0.3477 0.1527 0.7334 1.5112
scale(date)-House Sparrow 0.3522 0.2825 -0.1639 0.3396 0.9380
scale(date)-Italian Sparrow 0.0381 0.4842 -0.9671 0.0651 0.9485
scale(date)-European Goldfinch 0.3367 0.1693 0.0142 0.3350 0.6837
I(scale(date)^2)-Eurasian Jay -0.0175 0.1072 -0.2255 -0.0154 0.1967
I(scale(date)^2)-Blue Tit 0.2730 0.1718 -0.0484 0.2671 0.6240
I(scale(date)^2)-Coal Tit 0.1956 0.1797 -0.1341 0.1898 0.5749
I(scale(date)^2)-Winter Wren -0.0182 0.1451 -0.2996 -0.0194 0.2720
I(scale(date)^2)-European Robin 0.2694 0.1866 -0.0700 0.2558 0.6608
I(scale(date)^2)-Common Redstart -0.2122 0.1593 -0.5247 -0.2086 0.1008
I(scale(date)^2)-Whinchat -0.1188 0.2433 -0.5909 -0.1177 0.3419
I(scale(date)^2)-House Sparrow 0.2212 0.2283 -0.1923 0.2090 0.6942
I(scale(date)^2)-Italian Sparrow 0.2095 0.3512 -0.4257 0.1833 0.9957
I(scale(date)^2)-European Goldfinch 0.1433 0.1451 -0.1396 0.1383 0.4362
scale(dur)-Eurasian Jay 0.2071 0.1183 -0.0241 0.2082 0.4430
scale(dur)-Blue Tit -0.0500 0.1405 -0.3296 -0.0478 0.2211
scale(dur)-Coal Tit 0.6856 0.1834 0.3387 0.6831 1.0481
scale(dur)-Winter Wren 0.7702 0.1965 0.3931 0.7687 1.1615
scale(dur)-European Robin 0.4891 0.1795 0.1594 0.4876 0.8621
scale(dur)-Common Redstart 0.2438 0.1814 -0.1141 0.2395 0.6017
scale(dur)-Whinchat 0.3087 0.2554 -0.1783 0.3026 0.8328
scale(dur)-House Sparrow -0.1052 0.1849 -0.4777 -0.1002 0.2508
scale(dur)-Italian Sparrow 0.2972 0.4110 -0.5435 0.3025 1.1304
scale(dur)-European Goldfinch 0.1727 0.1254 -0.0786 0.1733 0.4178
Rhat ESS
(Intercept)-Eurasian Jay 1.0024 2409
(Intercept)-Blue Tit 0.9998 3000
(Intercept)-Coal Tit 1.0102 1976
(Intercept)-Winter Wren 1.0095 1663
(Intercept)-European Robin 1.0008 2316
(Intercept)-Common Redstart 1.0018 3000
(Intercept)-Whinchat 1.0051 3000
(Intercept)-House Sparrow 0.9995 2552
(Intercept)-Italian Sparrow 1.0033 2260
(Intercept)-European Goldfinch 0.9998 3000
scale(date)-Eurasian Jay 0.9999 4529
scale(date)-Blue Tit 1.0028 3111
scale(date)-Coal Tit 1.0003 1803
scale(date)-Winter Wren 1.0001 2366
scale(date)-European Robin 1.0046 2101
scale(date)-Common Redstart 1.0058 3000
scale(date)-Whinchat 1.0005 2151
scale(date)-House Sparrow 1.0013 2573
scale(date)-Italian Sparrow 1.0001 3000
scale(date)-European Goldfinch 1.0024 2979
I(scale(date)^2)-Eurasian Jay 1.0014 3183
I(scale(date)^2)-Blue Tit 1.0021 2914
I(scale(date)^2)-Coal Tit 1.0014 2033
I(scale(date)^2)-Winter Wren 1.0037 2480
I(scale(date)^2)-European Robin 1.0080 2026
I(scale(date)^2)-Common Redstart 1.0035 3152
I(scale(date)^2)-Whinchat 1.0000 2785
I(scale(date)^2)-House Sparrow 0.9997 2661
I(scale(date)^2)-Italian Sparrow 1.0014 3000
I(scale(date)^2)-European Goldfinch 1.0051 2806
scale(dur)-Eurasian Jay 1.0002 2530
scale(dur)-Blue Tit 1.0009 3000
scale(dur)-Coal Tit 1.0093 1995
scale(dur)-Winter Wren 1.0036 1575
scale(dur)-European Robin 1.0006 2234
scale(dur)-Common Redstart 1.0058 3000
scale(dur)-Whinchat 1.0007 3000
scale(dur)-House Sparrow 1.0025 2612
scale(dur)-Italian Sparrow 1.0091 3000
scale(dur)-European Goldfinch 1.0008 3000
Convergence and mixing of all model parameters appears to be adequate based on the Rhat values and ESS values. This can be further explored with traceplots.
# Traceplot for community-level occurrence effects
plot(out.msPGOcc, param = 'beta.comm', density = FALSE)
A perhaps more easily readable summary of the species-specific
effects can be generated graphically using the MCMCvis
package (Youngflesh 2018).
# Linear and quadratic effects of elevation
MCMCplot(out.msPGOcc$beta.samples, ref_ovl = TRUE, params = 'elevation',
exact = FALSE, main = 'Elevation')
We see every species has a negative quadratic relationship with
elevation, yet there is large variability in the linear effect. This
indicates each species has some “optimal” elevation at which occupancy
probability is the highest, but that this peak is variable across the
different species. We can confirm this by using the
predict() function to predict occupancy probability across
a gradient of elevation (while holding forest cover at its mean
value).
# Create a set of values across the range of observed elevation values
elevation.pred.vals <- seq(min(data.swiss.mhb$occ.covs$elevation),
max(data.swiss.mhb$occ.covs$elevation),
length.out = 100)
# Scale predicted values by mean and standard deviation used to fit the model
elevation.pred.vals.scale <- (elevation.pred.vals - mean(data.swiss.mhb$occ.covs$elevation)) /
sd(data.swiss.mhb$occ.covs$elevation)
# Create design matrix (matrix to hold all the covariates, including the intercept)
# Number of occurrence parameters for each species
p.occ <- ncol(out.msPGOcc$beta.comm.samples)
X.0 <- matrix(1, nrow = length(elevation.pred.vals), ncol = p.occ)
(colnames(X.0) <- colnames(out.msPGOcc$beta.comm.samples))
[1] "(Intercept)" "scale(forest)" "I(scale(forest)^2)"
[4] "scale(elevation)" "I(scale(elevation)^2)"
X.0[, 'scale(forest)'] <- 0
X.0[, 'I(scale(forest)^2)'] <- 0
X.0[, 'scale(elevation)'] <- elevation.pred.vals.scale
X.0[, 'I(scale(elevation)^2)'] <- elevation.pred.vals.scale^2
head(X.0)
(Intercept) scale(forest) I(scale(forest)^2) scale(elevation)
[1,] 1 0 0 -1.468424
[2,] 1 0 0 -1.429032
[3,] 1 0 0 -1.389641
[4,] 1 0 0 -1.350249
[5,] 1 0 0 -1.310858
[6,] 1 0 0 -1.271466
I(scale(elevation)^2)
[1,] 2.156269
[2,] 2.042134
[3,] 1.931102
[4,] 1.823173
[5,] 1.718348
[6,] 1.616626
# Predict occupancy
out.pred <- predict(out.msPGOcc, X.0)
str(out.pred)
List of 3
$ psi.0.samples: num [1:3000, 1:10, 1:100] 0.935 0.912 0.787 0.71 0.827 ...
$ z.0.samples : int [1:3000, 1:10, 1:100] 1 0 1 1 0 1 1 1 1 1 ...
$ call : language predict.msPGOcc(object = out.msPGOcc, X.0 = X.0)
- attr(*, "class")= chr "predict.msPGOcc"
# Extract quantiles from the posterior samples
psi.0.quants <- apply(out.pred$psi.0.samples, c(2, 3), quantile,
prob = c(0.025, 0.5, 0.975))
# Species names
sp.names <- dimnames(data.swiss.mhb$y)[[1]]
# Number of species
N <- length(sp.names)
# Put in a data frame for ggplot
psi.plot.dat <- data.frame(psi.med = c(psi.0.quants[2, , ]),
psi.low = c(psi.0.quants[1, , ]),
psi.high = c(psi.0.quants[3, , ]),
elevation = rep(elevation.pred.vals, each = N),
sp = rep(sp.names, length(elevation.pred.vals)))
ggplot(psi.plot.dat, aes(x = elevation, y = psi.med)) +
geom_ribbon(aes(ymin = psi.low, ymax = psi.high), fill = 'grey70') +
geom_line() +
theme_bw() +
scale_y_continuous(limits = c(0, 1)) +
facet_wrap(vars(sp)) +
labs(x = 'Elevation (m)', y = 'Occupancy Probability')
Our intuition regarding the relationship between elevation and the 10 different species is confirmed with the plots. We see a few very common species that occur at essentially all elevations with the exception of the highest elevation areas. We also see a fairly rare species, the Italian Sparrow, which only occurs at low probability at low elevation sites. These results make sense, as the Italian Sparrow only occurs along the Italian Peninsula.
Now, suppose our goal is to create distribution maps of the 10 species across Switzerland. Given results from previous analyses in Switzerland (Guélat and Kéry 2018), we have reason to believe a spatially-explicit model will provide more accurate predictions compared to a non-spatial model. Below we fit a spatially-explicit multi-species occupancy model using a spatial factor modeling approach, the details of which are discussed in Doser, Finley, and Banerjee (2023). As discussed in the multi-species occupancy modeling lecture, in this approach we model spatial autocorrelation for each species in the community by estimating a small set of “missing covariates” that are assigned a spatial structure, and then simultaneously estimating the species-specific effects of these “missing covariates” for each species. Here we choose to include 2 “missing covariates” (i.e., factors). Figuring out how to choose the number of factors to include can be a bit tricky. We provide a set of guidelines on the package website for how to best make this decision.
Notice that below we make two decisions when running the model that are done solely to allow us to run this example in a short amount of time: (1) we use 5 neighbors in the NNGP approximation; and (2) we only run a single MCMC chain. For a complete analysis, we would run this model for three chains with 15 neighbors and assess convergence with the Rhat diagnostic. If we wanted to see if 5 neighbors were adequate to model the spatial autocorrelation, we could then compare a model using 5 neighbors with 15 neighbors using WAIC to see if there is any substantial improvement.
# Using two spatial factors ("missing covariates" to explain spatial autocorrelation)
n.factors <- 2
# Number of neighbors for NNGP
n.neighbors <- 5
# Number of chains
n.chains <- 1
Just like we saw with the single-species spatial occupancy models,
spatial multi-species occupancy models also require us to specify the
number of MCMC samples in terms of a set of batches
(n.batch) each with some pre-specified number of samples
per batch (batch.length). Below we will run the model for
800 batches, each of length 25, which results in a total of
n.batch * batch.length MCMC samples for our single chain
(20,000 samples). We will discard 10,000 samples as burn-in and use a
thinning rate of 10, which will result in a total of 1000 MCMC samples
from our single chain.
n.batch <- 800
batch.length <- 25
n.thin <- 10
n.burn <- 10000
# Approx run time: < 3 min
out.sfMsPGOcc <- sfMsPGOcc(occ.formula = ~ scale(forest) + I(scale(forest)^2) +
scale(elevation) + I(scale(elevation)^2),
det.formula = ~ scale(date) + I(scale(date)^2) +
scale(dur),
data = data.swiss.mhb,
n.batch = n.batch,
batch.length = batch.length,
n.neighbors = n.neighbors,
n.thin = n.thin,
verbose = TRUE,
n.factors = n.factors,
n.burn = n.burn,
n.chains = n.chains,
n.report = 200)
----------------------------------------
Preparing to run the model
----------------------------------------
No prior specified for beta.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for alpha.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for tau.sq.beta.
Using an inverse-Gamma prior with prior shape 0.1 and prior scale 0.1
No prior specified for tau.sq.alpha.
Using an inverse-Gamma prior with prior shape 0.1 and prior scale 0.1
No prior specified for phi.unif.
Setting uniform bounds based on the range of observed spatial coordinates.
z is not specified in initial values.
Setting initial values based on observed data
beta.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
alpha.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10
tau.sq.alpha is not specified in initial values.
Setting initial values to random values between 0.5 and 10
beta is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
alpha is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
phi is not specified in initial values.
Setting initial value to random values from the prior distribution
lambda is not specified in initial values.
Setting initial values of the lower triangle to 0
----------------------------------------
Building the neighbor list
----------------------------------------
----------------------------------------
Building the neighbors of neighbors list
----------------------------------------
----------------------------------------
Model description
----------------------------------------
Spatial Factor NNGP Multi-species Occupancy Model with Polya-Gamma latent
variable fit with 266 sites and 10 species.
Samples per chain: 20000 (800 batches of length 25)
Burn-in: 10000
Thinning Rate: 10
Number of Chains: 1
Total Posterior Samples: 1000
Using the exponential spatial correlation model.
Using 2 latent spatial factors.
Using 5 nearest neighbors.
Source compiled with OpenMP support and model fit using 1 thread(s).
Adaptive Metropolis with target acceptance rate: 43.0
----------------------------------------
Chain 1
----------------------------------------
Sampling ...
Batch: 200 of 800, 25.00%
Latent Factor Parameter Acceptance Tuning
1 phi 44.0 0.62500
2 phi 24.0 0.47237
-------------------------------------------------
Batch: 400 of 800, 50.00%
Latent Factor Parameter Acceptance Tuning
1 phi 44.0 0.50158
2 phi 84.0 0.54335
-------------------------------------------------
Batch: 600 of 800, 75.00%
Latent Factor Parameter Acceptance Tuning
1 phi 60.0 0.63763
2 phi 20.0 0.55433
-------------------------------------------------
Batch: 800 of 800, 100.00%
We can also explicitly set the priors distributions and initial
values for all model parameters if desired. Note that a standard normal
distribution is used for the factor loadings (i.e., the effects of the
“missing covariates”), which is a requirement for fitting these models
in spOccupancy. We do not run the below code chunk, but
show how prior distributions and initial values can be specified and
supplied as arguments to sfMsPGOcc().
# Calculate distance matrix for prior on spatial decay parameters
dist.mat <- dist(data.swiss.mhb$coords)
# Prior distributions
priors <- list(beta.comm.normal = list(mean = 0, var = 2.72),
alpha.comm.normal = list(mean = 0, var = 2.72),
tau.sq.beta.ig = list(a = 0.1, b = 0.1),
tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
phi.unif = list(a = 3 / max(dist.mat), b = 3 / min(dist.mat)))
# Number of species
N <- nrow(data.swiss.mhb$y)
# Initial values for the factor loadings
lambda.inits <- matrix(0, N, n.factors)
diag(lambda.inits) <- 1
inits <- list(beta.comm = 0, alpha.comm = 0, tau.sq.beta = 1,
tau.sq.alpha = 1, beta = 0, alpha = 0,
lambda = lambda.inits,
z = apply(data.swiss.mhb$y, c(1, 2), max, na.rm = TRUE))
# Not run
out.sfMsPGOcc <- sfMsPGOcc(occ.formula = ~ scale(forest) + I(scale(forest)^2) +
scale(elevation) + I(scale(elevation)^2),
det.formula = ~ scale(date) + I(scale(date)^2) +
scale(dur),
data = data.swiss.mhb,
n.batch = n.batch,
priors = priors,
inits = inits,
batch.length = batch.length,
n.neighbors = n.neighbors,
n.thin = n.thin,
verbose = TRUE,
n.factors = n.factors,
n.burn = n.burn,
n.chains = n.chains,
n.report = 200)
We next compare the spatial model to the non-spatial model using WAIC
to assess any improvements in model fit. Note that for multi-species
model, we can calculate WAIC for the overall community of species, as
well as for each individual species. We do this using the
by.sp argument.
# Default is one overall WAIC for the community
waicOcc(out.msPGOcc)
elpd pD WAIC
-1975.25324 78.26685 4107.04017
waicOcc(out.sfMsPGOcc)
elpd pD WAIC
-1802.9994 175.6065 3957.2118
# Set by.sp = TRUE for species-specific WAIC values
waicOcc(out.msPGOcc, by.sp = TRUE)
elpd pD WAIC
1 -390.19163 9.270757 798.92478
2 -238.07093 8.497950 493.13776
3 -177.00011 7.967000 369.93422
4 -153.68403 9.454696 326.27746
5 -157.19016 5.941142 326.26260
6 -240.72010 9.276270 499.99274
7 -93.24857 7.138361 200.77386
8 -166.62461 8.215219 349.67966
9 -25.57159 3.595598 58.33437
10 -332.95151 8.909852 683.72273
waicOcc(out.sfMsPGOcc, by.sp = TRUE)
elpd pD WAIC
1 -381.234988 14.865365 792.2007
2 -221.208093 20.124554 482.6653
3 -172.084205 12.700058 369.5685
4 -149.279719 14.189343 326.9381
5 -154.803173 9.505830 328.6180
6 -186.699102 33.509890 440.4180
7 -72.915399 15.919727 177.6703
8 -152.729520 19.633988 344.7270
9 -8.953442 4.490507 26.8879
10 -303.091792 30.667216 667.5180
Here we see the spatial model substantially outperforms the nonspatial model. When looking individually at each species, we see there is clear variation in the amount of improvement the spatial model provides. Most species appear to have substantial improvements in the spatial model, but the third, fourth, and fifth species (Coal Tit, Winter Wren, and European Robin) do not show any improvement.
Let’s take a look at the estimated “factor loadings”, which are the species-specific effects of the “missing covariates” (the spatial factors) in the model that are used to account for spatial autocorrelation.
MCMCplot(out.sfMsPGOcc$lambda.samples, ref_ovl = TRUE, main = 'Factor Loadings')
Notice the three values that are fixed, which is necessary to be able to estimate the model. We see a few species with significant positive effects of the first factor as well as the second factor. Notice that the three species that did not have improved model fit with the spatial model (Coal Tit, Winter Wren, European Robin) all have non-significant effects. This makes intuitive sense. The factor loadings represent the “effects” of the “missing covariates” that are used to explain any residual spatial autocorrelation. If the values of the factor loadings are all very close to 0, that means the “effect” of the factors is essentially 0, and so we would expect a model without the factors to show similar model performance for such species. When we generate predictions across Switzerland, we can look at the estimated factor loadings (species-specific effects) together with a map of the predicted spatial factor (“missing covariate”) across the country to see if we can glean any sort of information from this assessment. We will return to this shortly.
Now knowing our spatial model outperforms the top performing model let’s do a posterior predictive check to assess the fit of our model.
# Perform a posterior predictive check to assess model fit.
ppc.out <- ppcOcc(out.sfMsPGOcc, fit.stat = 'freeman-tukey', group = 1)
Currently on species 1 out of 10
Currently on species 2 out of 10
Currently on species 3 out of 10
Currently on species 4 out of 10
Currently on species 5 out of 10
Currently on species 6 out of 10
Currently on species 7 out of 10
Currently on species 8 out of 10
Currently on species 9 out of 10
Currently on species 10 out of 10
# Calculate a Bayesian p-value as a simple measure of Goodness of Fit.
summary(ppc.out)
Call:
ppcOcc(object = out.sfMsPGOcc, fit.stat = "freeman-tukey", group = 1)
Samples per Chain: 20000
Burn-in: 10000
Thinning Rate: 10
Number of Chains: 1
Total Posterior Samples: 1000
----------------------------------------
Community Level
----------------------------------------
Bayesian p-value: 0.1938
----------------------------------------
Species Level
----------------------------------------
Eurasian Jay Bayesian p-value: 0.229
Blue Tit Bayesian p-value: 0.085
Coal Tit Bayesian p-value: 0.051
Winter Wren Bayesian p-value: 0.095
European Robin Bayesian p-value: 0.089
Common Redstart Bayesian p-value: 0.248
Whinchat Bayesian p-value: 0.387
House Sparrow Bayesian p-value: 0.057
Italian Sparrow Bayesian p-value: 0.509
European Goldfinch Bayesian p-value: 0.188
Fit statistic: freeman-tukey
Overall across all species, our Bayesian p-value is about 0.2,
suggesting adequate model fit. However, when we look at the individual
p-values for specific species, we see there are a few species with
inadequate model fits. We can explore this further by looking more
closely at the ppc.out object. The
ppc.out$fit.y and ppc.out$fit.y.rep contains
the estimated fit statistic (in this case the Freeman-Tukey statistic)
calculated across all data points individually for each species and MCMC
sample for the true data and the replicated data, respectively. Plotting
these two components in a scatter plot can give insight into how well
the model fits. In the below plot, the blue points are those for which
the true data has a larger fit statistic value compared to the replicate
data, while the red points are those where the value for the fitted data
is larger than the true data.
str(ppc.out)
List of 14
$ fit.y : num [1:1000, 1:10] 22 23.7 26.2 23.9 21.8 ...
$ fit.y.rep : num [1:1000, 1:10] 15 18 15.1 23.7 15.1 ...
$ fit.y.group.quants : num [1:5, 1:10, 1:266] 7.59e-05 1.31e-03 2.66e-03 4.43e-03 8.79e-03 ...
..- attr(*, "dimnames")=List of 3
.. ..$ : chr [1:5] "2.5%" "25%" "50%" "75%" ...
.. ..$ : NULL
.. ..$ : NULL
$ fit.y.rep.group.quants: num [1:5, 1:10, 1:266] 0.000312 0.003215 0.061401 0.086919 1.985947 ...
..- attr(*, "dimnames")=List of 3
.. ..$ : chr [1:5] "2.5%" "25%" "50%" "75%" ...
.. ..$ : NULL
.. ..$ : NULL
$ group : num 1
$ fit.stat : chr "freeman-tukey"
$ class : chr "sfMsPGOcc"
$ call : language ppcOcc(object = out.sfMsPGOcc, fit.stat = "freeman-tukey", group = 1)
$ n.samples : num 20000
$ n.burn : num 10000
$ n.thin : num 10
$ n.post : int 1000
$ n.chains : num 1
$ sp.names : chr [1:10] "Eurasian Jay" "Blue Tit" "Coal Tit" "Winter Wren" ...
- attr(*, "class")= chr "ppcOcc"
# Generate a plot for all species
ppc.df <- data.frame(fit = c(ppc.out$fit.y),
fit.rep = c(ppc.out$fit.y.rep),
color = 'lightskyblue1')
ppc.df$color[ppc.df$fit.rep > ppc.df$fit] <- 'lightsalmon'
plot(ppc.df$fit, ppc.df$fit.rep, bg = ppc.df$color, pch = 21,
ylab = 'Replicate Data Fit Statistic', xlab = 'True Data Fit Statistic',
main = 'All species')
lines(ppc.df$fit, ppc.df$fit, col = 'black')
The above figure makes the Bayesian p-value clear; it is the proportion of the points that fall above the one to one line (or alternatively, the proportion of points that are red). We could also modify the above code to generate a plot for one species at a time, which could help reveal the reasons behind the inadequate model fit for some species. We leave that to you to explore.
Lastly, we will generate a distribution map for each of the 10
species across Switzerland. We do this by using the
predict() function along with all the covariates used for
the model. We read in the prediction object below.
# Predict occupancy probability and species richness across Switzerland
# Load prediction objects (loads objects pred.swiss and coords.0)
load("switzerlandPredData.rda")
str(pred.swiss)
num [1:42275, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "intercept" "elevation" "forest"
Prediction objects for multi-species models in
spOccupancy can be quite large, particularly when
predicting across a large number of locations (in this case 42,275).
Here we will subset the complete grid of 1x1km points across Switzerland
to predict at 8000 random locations across the country.
# Set seed so we get the same values.
set.seed(1000)
# Get the indices for the new locations (remove the locations where we have data)
new.site.indx <- which(is.na(match(do.call("paste", as.data.frame(coords.0)),
do.call("paste", as.data.frame(data.swiss.mhb$coords)))))
pred.indx <- sample(new.site.indx, 8000, replace = FALSE)
# Subset the prediction objects
pred.swiss <- pred.swiss[pred.indx, ]
coords.0 <- coords.0[pred.indx, ]
# Look to see how our points are distributed
plot(coords.0, pch = 19)
# Standardize elevation and forest prediction values by values used to fit model
elevation.0 <- (pred.swiss[, 'elevation'] - mean(data.swiss.mhb$occ.covs$elevation)) /
sd(data.swiss.mhb$occ.covs$elevation)
forest.0 <- (pred.swiss[, 'forest'] - mean(data.swiss.mhb$occ.covs$forest)) /
sd(data.swiss.mhb$occ.covs$forest)
# Create prediction design matrix
X.0 <- cbind(1, forest.0, forest.0^2, elevation.0, elevation.0^2)
(colnames(X.0) <- colnames(out.sfMsPGOcc$beta.comm.samples))
[1] "(Intercept)" "scale(forest)" "I(scale(forest)^2)"
[4] "scale(elevation)" "I(scale(elevation)^2)"
# Predict at new locations
out.pred <- predict(out.sfMsPGOcc, X.0, coords.0)
----------------------------------------
Prediction description
----------------------------------------
Spatial Factor NNGP Multispecies Occupancy model with Polya-Gamma latent
variable fit with 266 observations.
Number of covariates 5 (including intercept if specified).
Using the exponential spatial correlation model.
Using 5 nearest neighbors.
Using 2 latent spatial factors.
Number of MCMC samples 1000.
Predicting at 8000 non-sampled locations.
Source compiled with OpenMP support and model fit using 1 threads.
-------------------------------------------------
Predicting
-------------------------------------------------
Location: 100 of 8000, 1.25%
Location: 200 of 8000, 2.50%
Location: 300 of 8000, 3.75%
Location: 400 of 8000, 5.00%
Location: 500 of 8000, 6.25%
Location: 600 of 8000, 7.50%
Location: 700 of 8000, 8.75%
Location: 800 of 8000, 10.00%
Location: 900 of 8000, 11.25%
Location: 1000 of 8000, 12.50%
Location: 1100 of 8000, 13.75%
Location: 1200 of 8000, 15.00%
Location: 1300 of 8000, 16.25%
Location: 1400 of 8000, 17.50%
Location: 1500 of 8000, 18.75%
Location: 1600 of 8000, 20.00%
Location: 1700 of 8000, 21.25%
Location: 1800 of 8000, 22.50%
Location: 1900 of 8000, 23.75%
Location: 2000 of 8000, 25.00%
Location: 2100 of 8000, 26.25%
Location: 2200 of 8000, 27.50%
Location: 2300 of 8000, 28.75%
Location: 2400 of 8000, 30.00%
Location: 2500 of 8000, 31.25%
Location: 2600 of 8000, 32.50%
Location: 2700 of 8000, 33.75%
Location: 2800 of 8000, 35.00%
Location: 2900 of 8000, 36.25%
Location: 3000 of 8000, 37.50%
Location: 3100 of 8000, 38.75%
Location: 3200 of 8000, 40.00%
Location: 3300 of 8000, 41.25%
Location: 3400 of 8000, 42.50%
Location: 3500 of 8000, 43.75%
Location: 3600 of 8000, 45.00%
Location: 3700 of 8000, 46.25%
Location: 3800 of 8000, 47.50%
Location: 3900 of 8000, 48.75%
Location: 4000 of 8000, 50.00%
Location: 4100 of 8000, 51.25%
Location: 4200 of 8000, 52.50%
Location: 4300 of 8000, 53.75%
Location: 4400 of 8000, 55.00%
Location: 4500 of 8000, 56.25%
Location: 4600 of 8000, 57.50%
Location: 4700 of 8000, 58.75%
Location: 4800 of 8000, 60.00%
Location: 4900 of 8000, 61.25%
Location: 5000 of 8000, 62.50%
Location: 5100 of 8000, 63.75%
Location: 5200 of 8000, 65.00%
Location: 5300 of 8000, 66.25%
Location: 5400 of 8000, 67.50%
Location: 5500 of 8000, 68.75%
Location: 5600 of 8000, 70.00%
Location: 5700 of 8000, 71.25%
Location: 5800 of 8000, 72.50%
Location: 5900 of 8000, 73.75%
Location: 6000 of 8000, 75.00%
Location: 6100 of 8000, 76.25%
Location: 6200 of 8000, 77.50%
Location: 6300 of 8000, 78.75%
Location: 6400 of 8000, 80.00%
Location: 6500 of 8000, 81.25%
Location: 6600 of 8000, 82.50%
Location: 6700 of 8000, 83.75%
Location: 6800 of 8000, 85.00%
Location: 6900 of 8000, 86.25%
Location: 7000 of 8000, 87.50%
Location: 7100 of 8000, 88.75%
Location: 7200 of 8000, 90.00%
Location: 7300 of 8000, 91.25%
Location: 7400 of 8000, 92.50%
Location: 7500 of 8000, 93.75%
Location: 7600 of 8000, 95.00%
Location: 7700 of 8000, 96.25%
Location: 7800 of 8000, 97.50%
Location: 7900 of 8000, 98.75%
Location: 8000 of 8000, 100.00%
Generating latent occupancy state
str(out.pred)
List of 6
$ z.0.samples : num [1:1000, 1:10, 1:8000] 1 1 1 1 0 1 1 1 1 1 ...
$ w.0.samples : num [1:1000, 1:2, 1:8000] -1.58 -1.42 -1.05 -1.6 -1.55 ...
$ psi.0.samples: num [1:1000, 1:10, 1:8000] 0.815 0.789 0.866 0.664 0.791 ...
$ run.time : 'proc_time' Named num [1:5] 71.5 25.5 55.7 0 0
..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
$ call : language predict.sfMsPGOcc(object = out.sfMsPGOcc, X.0 = X.0, coords.0 = coords.0)
$ object.class : chr "sfMsPGOcc"
- attr(*, "class")= chr "predict.sfMsPGOcc"
The psi.0.samples component contains the posterior MCMC
samples of occupancy probability for each species across the 8000
locations. Below is some code to generate a very basic species
distribution map for each species.
# Mean occupancy probabilities for each species
psi.0.means <- apply(out.pred$psi.0.samples, c(2, 3), mean)
# Number of species
N <- nrow(psi.0.means)
# Number of prediction locations
J.0 <- ncol(psi.0.means)
plot.df <- data.frame(psi.mean = c(psi.0.means),
x = rep(coords.0[, 1], each = N),
y = rep(coords.0[, 2], each = N),
species = rep(sp.names, times = J.0))
pred.sf <- st_as_sf(plot.df, coords = c('x', 'y'))
ggplot() +
geom_sf(data = pred.sf, aes(color = psi.mean), size = 0.75) +
scale_color_viridis_c() +
theme_bw(base_size = 12) +
facet_wrap(vars(species)) +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank(),
legend.position = c(0.75, 0.15)) +
labs(x = "Longitude", y = "Latitude", color = 'Mean Occupancy')
In addition to generating species distribution maps for each species, we can also generate maps of community-level summaries. Below we generate a map of species richness, where we first calculate species richness for each MCMC sample by summing the estimated latent occupancy values for each species at each site.
# Calculate species richness as a derived-quantity of the latent occupancy
# values for each species
rich.samples <- apply(out.pred$z.0.samples, c(1, 3), sum)
# Mean species richness
rich.means <- apply(rich.samples, 2, mean)
# Standard deviation species richness
rich.sds <- apply(rich.samples, 2, sd)
# Create prediction map ---------------
plot.df <- data.frame(rich.mean = rich.means,
rich.sd = rich.sds,
x = coords.0[, 1],
y = coords.0[, 2])
pred.sf <- st_as_sf(plot.df, coords = c('x', 'y'))
rich.mean.plot <- ggplot() +
geom_sf(data = pred.sf, aes(color = rich.mean), size = 0.75) +
scale_color_viridis_c() +
theme_bw(base_size = 15) +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) +
labs(x = "Longitude", y = "Latitude", color = 'Richness Mean')
rich.sd.plot <- ggplot() +
geom_sf(data = pred.sf, aes(color = rich.sd), size = 0.75) +
scale_color_viridis_c() +
theme_bw(base_size = 15) +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) +
labs(x = "Longitude", y = "Latitude", fill = "")
rich.mean.plot
rich.sd.plot
The last two plots we create plot the two spatial factors (or the “missing covariates”) across the predicted locations. Generating maps of these factors, and looking at the estimated species-specific effects of these factors, can give some insight as to the drivers of any residual spatial autocorrelation across the different species.
# Means of first spatial factor
w.1.means <- apply(out.pred$w.0.samples[, 1, ], 2, mean)
# Means of second spatial factor
w.2.means <- apply(out.pred$w.0.samples[, 2, ], 2, mean)
plot.df <- data.frame(w.1.mean = w.1.means,
w.2.mean = w.2.means,
x = coords.0[, 1],
y = coords.0[, 2])
pred.sf <- st_as_sf(plot.df, coords = c('x', 'y'))
w.1.plot <- ggplot() +
geom_sf(data = pred.sf, aes(color = w.1.mean), size = 0.75) +
scale_color_gradient2(midpoint = 0, low = '#B2182B', mid = 'white', high = '#2166AC',
na.value = NA) +
theme_bw(base_size = 15) +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) +
labs(x = "Longitude", y = "Latitude", color = '', title = 'First Spatial Factor')
w.2.plot <- ggplot() +
geom_sf(data = pred.sf, aes(color = w.2.mean), size = 0.75) +
scale_color_gradient2(midpoint = 0, low = '#B2182B', mid = 'white', high = '#2166AC',
na.value = NA) +
theme_bw(base_size = 15) +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) +
labs(x = "Longitude", y = "Latitude", color = '', title = 'Second Spatial Factor')
w.1.plot
w.2.plot